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Abstract 

The abihty to characterize the color content of natural imagery is an 
important application of image processing. The pixel by pixel coloring of 
images may be viewed naturally as points in color space, and the inherent 
structure and distribution of these points affords a quantization, through 
clustering, of the color information in the image. In this paper, we present 
a novel topologically driven clustering algorithm that permits segmentation 
of the color features in a digital image. The algorithm blends Locally Linear 
Embedding (LLE) and vector quantization by mapping color information to 
a lower dimensional space, identifying distinct color regions, and classify- 
ing pixels together based on both a proximity measure and color content. 
It is observed that these techniques permit a significant reduction in color 
resolution while maintaining the visually important features of images. 
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1. Introduction 

Manifold learning in data analysis assumes that a set of observations, 
taken as a whole, is locally well approximated by a topological (or even ge- 
ometric) manifold. This assumption implies that the data is locally well 
approximated by a linear space, i.e., it is locally fiat. A fundamental goal 
of manifold learning is to uncover the underlying structure of this approxi- 
mating manifold and to find low dimensional representations that preserve 
the structure and topology of the original data set optimally [Ij, ^2], [L3j. A 
frequent simplifying assumption is that the local dimension is constant over 
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the entire data set. Alternatively, one may model a set of data as a collec- 
tion of manifolds, allowing for intersections and for variations in dimension. 
For instance, the union of the xy-plane and the z-axis is not a manifold but 
decomposes naturally as a union of two manifolds of differing dimension. We 
have found this multiple manifold assumption to be appropriate for natural 
imagery consisting of distinct objects, e.g., a landscape image consisting of 
flowers, cacti, and ground vegetation. 

Points in a data set can typically be thought of as lying close to a low 
dimensional manifold if the points are parameterized by a relatively small 
number of continuous variables [4], [1], [5j. For instance, a manifold struc- 
ture could underlie a collection of images of a single object undergoing a 
change of state (such as illumination, pose, scale, translation, etc.). One 
way to uncover this structure is to map the collection to a high dimensional 
vector space by considering each image as a point with dimensionality cor- 
responding to the number of pixels in the image and with coordinate values 
corresponding to the brightness of each pixel j3], [6], [2]. Many algorithms 
have been implemented on such data sets in order to uncover a low dimen- 
sional manifold that reflects the inherent structure of the high dimensional 
data. Linear methods such as Principal Component Analysis [7J and Mul- 
tidimensional Scaling [8j have been around for many years while nonlinear 
methods such as ISOMAP [2], Locally Linear Embedding [3j, Hessian Eigen- 
maps [5j, and Laplacian Eigenmaps [S] are more recent and have proven 
capable of extracting highly nonlinear embeddings. 

In this paper, we focus on the Locally Linear Embedding (LLE) algorithm 
applied at the pixel level. More precisely, our data sets do not consist of a 
set of images but rather the pixels comprising a single image. Analysis of 
LLE applied to pixels has been implemented previously in and has been 
considered in the context of hyperspectral images by [H] , [12] , [13] . These 
works conflrm the existence of an underlying structure. The goal of this 
paper is to utilize LLE to represent the underlying structure of color data in 
an image as a union of linear spaces and to quantize color space accordingly. 
While examples will be drawn from the color space of digital images within 
the visible spectrum, it is important to note that such images are a special 
case of hyperspectral imagery. Implementations in one setting can typically 
be implemented in the more general setting with minor modiflcations. 

An object under a flxed illumination condition, as perceived by the human 
eye, is often represented by considering a particular map to obtained by 
integrating, at each small region of an object, the product of the spectral 
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reflectance curve against three particular frequency response curves. We 
refer to these three functional as maps to red, green, and blue (RGB) space 
[14j. As a result of this map, a digital photograph typically represents a 
given object /illumination pair as an ^4 x 5 x 3 data array where the flrst 
two coordinates record the location in the image and the last coordinate 
records the values of the red, green and blue functional on the associated 
spectral reflectance curve. By combining the three A x B color sheets, one 
can well approximate the human perception of the object /illumination pair, 
see Figure [TJ 




Figure 1: Illustration of the red, green, and blue sheets associated to pixels 
of an image. 



The data we will be considering consists oi Ax B x3 arrays corresponding 
to digital pictures of natural imagery. The entries in the three A x B sheets 
correspond to the energy near the red, green, or blue frequencies at each 
pixel. Each color component of a pixel is an integral value between and 
255 (corresponding to an eight bit representation). Each pixel is associated 
to a point in representing the (RGB) color of the pixel. As there are three 
components of each color with 256 possible choices each, there are 256^ = 
16, 777, 216 distinct colors that can be represented. The color content of an 
image can typically be quantized at a much coarser level (while maintaining 
most of the visual information) by allowing each pixel's color to be identifled 
with a prototype as determined by a quantization algorithm. Much work has 
been done to quantize the color space of natural imagery. The wide variety of 
approaches include statistical-based, graph theoretical, clustering, gradient 
descent, among many other techniques jT5], [6], [l6], [TTj, jH], [19], [20], [21], 

m, [24J. 

In the LLE algorithm, a weighted graph is flrst constructed from a set 
of data as a stand-in for the local manifold structure [3]. The algorithm 
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next determines a set of d embedding vectors by discarding the eigenvector 
corresponding to the smaUest eigenvalue of an associated graph Laplacian 
and keeping the 2^^ through d+ eigenvectors. Arranging the d vectors as 
columns of a matrix, the rows of this matrix provide a map of the original 
data to M^. The graph Laplacian encodes the number of connected compo- 
nents of the graph as the dimension of its null space. Interpreting results 
in the LLE algorithm becomes problematic if the null space has dimension 
greater than one as eigenvectors in a null space are unique only up to ro- 
tation. Thus, a canonical ordering of eigenvectors is ill defined. As it is 
quite reasonable to expect the color space of natural images to be lying on 
multiple manifolds, it is also natural to expect multiple connected compo- 
nents amongst the union of the manifolds. In order to alleviate this problem, 
we perturb the graph Laplacian in the direction of a circulant graph Lapla- 
cian to reduce the co-rank to one. From the d-dimensional embedding, we 
apply a technique for proximity /color segmentation. This is done through 
an unsupervised clustering algorithm that exploits the topology preserving 
properties of LLE. 

Put another way, natural images are not random; they have structure in 
their color space in that adjacent pixels tend to have similar color values. 
These piecewise continuous variations lead to a piecewise manifold approx- 
imating the data. Through LLE, this piecewise manifold is revealed as a 
piecewise linear manifold. The segmentation is accomplished by uncovering 
the principal direction of an epsilon ball of points and segmenting the data 
such that points determined to be close enough to this principal vector and 
similarly colored to the center of the epsilon ball are classified together and 
removed from the data. This iterative approach has proven robust in the 
presence of noise, with the input parameters refiecting the accuracy of seg- 
mentation desired. Thus, the algorithm exploits the transformation of local 
one-manifold structure to local linear structure in the mapped data. 

In Section [2} we present a brief overview of the Locally Linear Embed- 
ding algorithm and present the graph Laplacian perturbation to reduce to the 
case of co-rank one. Section [3] discusses the algorithm in conjunction with 
color quantization. We present an example to observe that the geometric 
structure of a color image is revealed in a reconstruction image by exploit- 
ing locally-linear variations in pixel space through subspace segmentation. 
Section |4] demonstrates the algorithm's ability to reduce the color space of 
natural imagery and uses this algorithm in conjunction with the classical 
Linde-Buzo-Gray vector quantization algorithm [25], [21j in the context of 
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a landscape ecology application. The contributions of this paper include 
a method for resolving LLE rank issue problems (without carrying out a 
decomposition into connected components) in such a manner that the lo- 
cal topological structure of the data is preserved, a technique for subspace 
segmentation, and the development of an associated clustering algorithm. 

2. Connecting Components in Locally Linear Embedding 

2.L Locally Linear Embedding Algorithm 

The Locally Linear Embedding (LLE) algorithm [3j is an unsupervised 
dimensionality reduction algorithm that determines a mapping of data, lying 
in a higher dimensional vector space, to a lower dimensional vector space 
while optimizing the maintenance of local spatial relationships within the 
data. Through this map, the LLE algorithm uncovers a lower dimensional 
representation of the data with the goal of preserving the topology and neigh- 
borhood structure of the original higher dimensional data. The first step of 
the algorithm requires a criterion to determine the nearest neighbors of each 
data point. The second step is to associate to each neighbor a weight. This 
weight is calculated by solving a least squares problem that minimizes a cer- 
tain reconstruction error e{W). More precisely, if denotes the i^^ data 
point from a set of p points and A^^ denotes the indices of its set of nearest 
neighbors, one determines the values of Wij that minimize the expression 

p 

The final step of the algorithm is to determine a set of lower dimensional 
vectors, y^, that minimize the function 

i=i jeNi 

Let W denote the p x p matrix containing the weights Wij (padded out with 
zeros). The ^^'s are found by solving the eigenvector problem MY^ = Y^A 
where M = I - W - + W^W = (/ - Wf (/ - T^), A is the diagonal 
matrix of Lagrange multipliers, and the i^^ row of Y^ corresponds to yi. 

If our data set corresponds to a sampling of a manifold and if this sampling 
is sufficiently dense, then a fundamental assumption of the algorithm is that 
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each data point and its nearest neighbors can be characterized by a locaUy 
hnear patch of the manifold, hence the name LocaUy Linear Embedding. 
Data points that were close together in the original higher dimensional space 
should still be close together after mapped to lie in the lower dimensional 
space thus preserving the topology of the original data set. Details of the 



LLE implementation can be found in [Appendix A 



2.2. Example 

This section consists of an example illustrating the LLE algorithm's topol- 
ogy preserving capabilities. The data set consists of images of a black square 
translated over a background of random noise. The black square is allowed 
to split and "wrap" around each boundary edge of the background. To con- 
struct the data set, we start with a 20 x 20 matrix consisting of random 
entries between and 1. Within this matrix, a 10 x 10 zero matrix is super- 
imposed. The data set is generated by considering all positions of the 10 x 10 
matrix inside the 20 x 20 matrix of random noise (allowing both horizontal 
and vertical wrapping). Thus from a topological point of view, the data set 
corresponds to a noisy sampling of a torus in M^^^. We defined the nearest 
neighbors, of each element in the data set, to be the 4 nearest data points 
in M^^^. The LLE algorithm was then used to map the data to M^. The 
resulting embedded data in is displayed in Figure [2] and refiects the orig- 
inal topological structure quite clearly. In general, results from experiments 




Figure 2: A plot of the embedding vectors obtained by LLE of images as 
described above. 



suggest that LLE can indeed be successful in its goal of a nonlinear dimen- 
sionality reduction that captures inherent topological properties through a 
single, linear algebra derived, map. Due to its effectiveness, we have chosen 
to exploit this reduction in the context of a clustering algorithm. 
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2.3. Special Considerations in Implementation of LLE on Natural Imagery 
As natural images have the feature that many pixel colors are quite simi- 
lar, it would not be surprising to find pixels with identical colors. Thus, when 
we consider an image as a collection of points in M^, we may observe distinct 
image pixels whose distance apart is zero. In determining nearest neighbors 
for a point x^, we have opted to only include points whose distance from Xi 
is greater than zero. We have observed that for many natural images, the 
k nearest neighbor's graph has corank larger than 1 indicating more than 
one connected component. For disconnected data, LLE can be implemented 
on each of the graph's connected components separately [3j. In this paper 
we have chosen a different (and slightly unusual) path in that we artificially 
connect components by perturbing in the direction of a cycle. 

2.4' Connecting Disconnected Components 

The Laplacian of a graph has as an eigenvalue with multiplicity equal to 
the number of connected components of the graph j26j. In a similar manner, 
the matrix M, in the final step of the LLE algorithm, has co-rank corre- 
sponding to the number of connected components within the data set where 
connections are made by linking each data point with its nearest neighbors. 
Choosing the number of nearest neighbors to be small can lead to many dis- 
connected components. As previously stated, while ^ indicates that LLE 
be implemented on each of the graph's connected components separately, 
we have chosen to proceed down a diflFerent path by artificially connecting 
previously disconnected components through a perturbation (much like the 
second step in LLE that adds a regularization term to the covariance matrix 
C that would be singular in the case when k > D). Here, we perturb M 
in the direction of a matrix T that has a similar structure to M in that it 
is positive semidefinite with row sums equal to 0. We chose T to be the 
Laplacian matrix of a cycle, i.e. 



T 
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Thus, in practice, we consider = M + XT where A is a scalar. Mat lab 
experiments suggest that using A = 10~^ produces a matrix that is ar- 
tificially connected (i.e. has a corank of 1). As T comes from a cycle, the 
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eigenvectors of M' inherit this circular structure while retaining much of the 
original topology of the pixel data. 

3. Locally Linear Embedding Clustering 

In this section, we consider a novel way to extend the LLE algorithm to a 
clustering algorithm. In natural imagery, the variation in color hues of neigh- 
boring pixels is often slight. Furthermore, pixels which are not spatially close 
may also exhibit very slight color variations. We can associate these slight 
color changes with continuous variations of pixel colors in an approximating 
manifold. Topological structure associated with these continuous variations 
is revealed by uncovering multiple underlying manifolds. These manifolds are 
detected using the Locally Linear Embedding algorithm. Segmenting based 
on these manifolds allows for quantization of the color space, leading to a 
clustering algorithm. 

3.1. Clustering 

Data clustering is the name given to creating groups of objects, or clusters, 
such that objects in one cluster have a shared set of features whereas objects 
in different clusters have less similarity with respect to these features. Clus- 
tering is a fundamental approach to segmenting data. There are many ways 
to attach pairwise similarity scores to a set of data points and it is important 
to realize that the clusters could have very different properties and/or shapes 
depending on these scores. Clustering can be done in a hierarchical manner 
where clusters are determined by using previously established clusters or in a 
partitioning manner where the data is clustered simultaneously into disjoint 
sets. In semi-supervised or constrained clustering algorithms, additional in- 
formation such as data labels, information about the clusters themselves, 
etc. is available and utilized [27], ^5j. Unsupervised clustering algorithms, 
in which data is organized without any information like data labels, however, 
can identify major characteristics or patterns without any supervision from 
the user. The algorithm of this paper, described in the following subsections, 
is non-hierarchical and unsupervised. 

3.2. Why Locally Linear Embedding? 

The Locally Linear Embedding algorithm, as discussed in Section [2] and 
[Appendix A| is a dimensionality reduction algorithm that can help uncover 
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Figure 3: Images generated by Pattern Analysis Laboratory at Colorado 
State University where an individual remains motionless and the illumination 
of the surrounding area varies. 



the inherent structure and topology of higher dimensional data by deter- 
mining a map to a lower dimensional space that optimizes for neighborhood 
relationships. While LLE was intended for the purpose of revealing topo- 
logical structure, the creators of this algorithm indicate in [3j that some of 
the ideas presented in LLE, namely the first and third steps, are similar to 
those of the Normalized Cut algorithm discussed in [28j and other clustering 
methods such as the one discussed by [24j. 

Through experimentation, it was observed that if a natural image, consid- 
ered as a set of RGB color points, is embedded in using nearest neighbor 
sets of size 4 and a perturbation matrix T with A = 10~^ (as discussed in 



subsection 2.4), then the data lies on a relatively small collection of lines. 



When the inherent color of each of the higher-dimensional input vectors was 
superimposed on the corresponding reconstruction vectors, it was noticed 
that similar colors fell along the same line. We observed simpler (but sim- 
ilar) behavior when considering a fixed pixel in a set of images of a fixed 
object under changing ambient illumination conditions. In each of these two 
cases, the lines in the reduced space suggest a method for quantization. 

The following example uses a set of images from the Pattern Analysis 
Laboratory (PAL) database at Colorado State University. In the data, an 
individual remained motionless as the ambient illumination conditions were 
altered (with lights of fixed spectral characteristics). Figure |3] shows three 
such images with diflFerent illumination conditions. A data set was formed by 
considering the RGB values of a single, fixed pixel extracted from 200 such 
images. 

The LLE algorithm was implemented on this data set and mapped into 
a 2-dimensional space using = 4 nearest neighbors. The null space of M 
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(a) Data in (b) Data in k=8 (c) Data in R^ k=4 

Figure 4: Plots of 3D data points generated by extracting the RGB values 
of a single pixel for each of 200 images and their corresponding embedding 
vectors as reconstructed by the LLE algorithm using k=8 nearest neighbors, 
and k=4 nearest neighbors with 'connected' data points. 



turned out to be 4-dimensional indicating 4 connected components. Thus, 
our choice of nearest neighbor has artificially disconnected this data set that 
intuitively should be connected given that it is the smooth variation of illu- 
mination at a fixed point. Therefore, either we need to increase the number 
of nearest neighbors or perturb the data in such a way that the data is re- 
connected. Note that the smallest value of k that yields a 1-dimensional null 
space for M is = 8. 

In Figure [4| we see the original plot of the RGB data points, a plot of the 
embedding vectors using k = 8 nearest neighbors, and a plot of the embed- 
ding vectors using = 4 nearest neighbors with the perturbation discussed 
in Section 2A^ that artificially reconnects the data set. We observe that 
the original three dimensional data appears relatively linear. Using k = 8 
nearest neighbors, there is a degradation of this linear structure. However, 
using k = 4: nearest neighbors, with M perturbed by T, preserves the linear 
structure at a local level. 

Therefore, we have observed two important properties of LLE. First, if 
a data set is disconnected using a choice of k nearest neighbors, it can be 
artificially reconnected using an appropriate perturbation. Provided the per- 
turbation is not too extreme, this does not aflFect the local topology of the 
data as expressed by LLE, as the linear structure of similar colors falling 
along one dimensional subspaces holds for each component. Second, the 
LLE algorithm is able to uncover the gradation of hue or variance of illumi- 
nation within an image which corresponds to a linear structure in the plot of 
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the reconstruction image of an RGB color space. Using this hnear structure, 
in which each data point of the reconstruction image is colored according 
to its corresponding high-dimensional data point, a color space clustering 
algorithm is obtained. 

Essentially, the Locally Linear Embedding Clustering (LLEC) algorithm 
segments the distinct linear manifolds that appear in the reconstruction plot 
of the LLE algorithm and then identifies which data points lie close to which 
subspace. The RGB color information is then overlaid onto the reconstruc- 
tion data and further used to segment the data by clustering similarly colored 
points together. 

3.3. Subspace Segmentation 

We propose a subspace segmentation technique in the LLEC algorithm 
that involves selecting a point, y*, in the reconstruction data and then con- 
structing an epsilon ball of appropriate size centered around this point. The 
point y* may be chosen randomly or with a more deterministic criterion 
such as those discussed in [Appendix B[ A data matrix, is formed by 



inserting each embedding vector falling inside this epsilon ball into the rows 
of the matrix. The singular value decomposition, A = UT>V^ ^ is computed 
to determine the right singular vector corresponding to the largest singular 
value. This singular vector corresponds to the line that passes through the 
mean and minimizes the sum of the square Euclidean distances between the 
points in the epsilon ball and the line, indicating the principal direction of 
these data points [7J. This unveils the 1-dimensional subspace that we are 
after. Note a method for fc-dimensional subspace segmentation is discussed 
below. In order to segment the data via its proximity to each subspace, we 
then determine which points y- in the data set satisfy ||y^ — PyJ| < ci where 
€1 is some tolerance to identify those points that are 'close enough to' the 
subspace in consideration, and P is the projection onto the first right singular 
vector. 

Now, as many lines overlap or intersect, simply using proximity to the 
subspace will not yield an appropriate segmentation. Thus, the data is fur- 
ther segmented by identifying those points, y^, whose RGB color values, x^, 
are most similar to the color of y*, x*, by computing ||x^ — x*|| < 62 where 
€2 is again some tolerance to identify which points are 'similarly' colored to 
y*. Points that are identified as being 'close enough to' the subspace gener- 
ated by the random point and 'similarly' colored to this point are identified 
together and removed from the data set. The process is repeated until all 
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1, Randomly select a pointy* 
in the reconstruction data, 
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2, Compute the SVD of all 
points within an epsilon ball 
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Figure 5: Illustration of subspace segmentation algorithm. 



points have been identified with a distinct subspace using both proximity 
and color information. Thus, we obtain a clustering based on both color and 
spatial proximity metrics (in the 2-D representation). 

Figure [5] provides a step-by-step illustration of this subspace segmentation 
algorithm. Also refer to Figure |6] to see an actual implementation of this 
algorithm. Observe that the subimages represent the subspaces iteratively 
removed from the reconstruction data. 

We have described the approach for 1-D subspace segmentation, but this 
method can be used for multiple dimensions as well by considering pixels suf- 
ficiently correlated with the first several principal components. Thus, if an 
m-dimensional subspace segmentation is desired, the m right singular vectors 
corresponding to the m largest singular values form the principal directions 
of the data and span the subspace we want to uncover. Once this subspace 
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(a) Original (b) 2D Reconstruction 
after LLE 
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Figure 6: Subspace segmentation of 2D reconstruction data using LLEC with 
accuracy tolerances of approximately 0.4 by initializing y* as the point whose 
b^^ nearest neighbor has the smallest distance. 
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is uncovered, the rest of the approach described above is analogous for a 
multidimensional subspace segmentation. Note that this subspace segmenta- 
tion technique has proven robust in the case of noise whereas methods such 
as Generalized Principal Component Analysis (GPCA) have proven less 
effective in our experiments. 

3.4' The Locally Linear Embedding Clustering Algorithm 

The LLEC algorithm for quantizing the color space of natural imagery 
is summarized in the steps below. First, embed a data set X with points of 
dimension D = 3 into a lower dimension d = 2 using = 4 nearest neigh- 
bors by using LLE. Next, identify the distinct subspaces appearing in the 
reconstruction data, y, of the LLE algorithm by using the process described 
in the previous section. Through this subspace segmentation, determine the 
Voronoi sets, Si^ formed by identifying those points that are 'close enough to' 
each subspace and 'similarly' colored to the point y*. Note that the number 
of distinct Voronoi sets is the number of distinct subspaces. Let's call this 
number S. Then, calculate the mean of the colors, Mi = ^ "^yeSi y ^^di 
set. Finally, identify all points y G 5^^ by the prototype fii. Note that each 
y- in the reconstruction data corresponds to a unique in the original data 
space, so this determines a clustering of the data set X. 

4. Implementation 

As indicated in ^22] a major complication in color space quantization 
often relates to varying shades of a given color due to illumination. We have 
observed that the LLEC algorithm handles this illumination component by 
identifying various shades of a given hue as a unique subspace and all pixels 
that are elements of this subspace can be identified together. 

At this time, the LLEC algorithm is not a fast algorithm as the procedure 
for performing LLE and the search to determine those points that are 'close 
enough to' each subspace and 'similarly' colored to the random point being 
considered are computationally intensive. We will see however that LLEC 
does an excellent job of quantizing the color space of natural imagery. A 
benefit of LLEC is that the only free parameters in the algorithm are ci and 
€2, the tolerances which can be specified by the user to refiect the desired 
accuracy of the quantization. The value of LLEC then is that it can be 
implemented on an image to determine the natural subspaces of the color 
space. The knowledge obtained by segmenting these subspaces can then 
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be used in conjunction with other clustering algorithms such as the Linde- 
Buzo-Gray (LBG) [2JJ vector quantization algorithm to identify the starting 
centers as the subspaces unveiled in the LLEC algorithm. We will see this 
applied shortly. 

4.1. LLEC used to Quantize Color Space 

First, let's consider the ability of the LLEC algorithm to quantize the 
color space of a variety of images. In each of these examples, the images 
were processed using MATLAB. Each sheet pertaining to the red, green, 
or blue component of pixels within an image was converted from a matrix 
of dimension equal to the resolution of each image to a long row vector of 
dimension 1 x p where p is the number of pixels in the image. A new matrix, 
X, of dimension 3 x p was created to contain all of the data entries of these 
long row vectors. By organizing the data this way, we see that each column 
of the matrix corresponds to the RGB components of an individual pixel 
which is a data point to be analyzed. We have chosen to use the Euclidean 
metric to calculate distance-a measure of proximity-between points. 

Let's first consider LLEC's ability to segment the color space of a natural 
image and then use this segmentation to quantize the color space. We high- 
light in Figure [6] LLEC's ability to segment the subspaces in the 2-dimensional 
plot using accuracy tolerances of approximately 0.4 for a sample image. In 
Figure [7[ we observe the color quantizations obtained for this image as well 
as others using various accuracy tolerances. Note that in Figure [7[ each of 
the original images were of resolution 100 x 100 or less. We see that as 61 and 
62 decrease, the reconstructions become better representations of the original 
images. 

4.2. LLEC Implemented with LBC 

Let's now see how LLEC can be implemented in conjunction with another 
clustering algorithm on a class of large images. These images are of a sub- 
alpine meadow near the Rocky Mountain Biological Laboratory in Gothic, 
Colorado provided by Dr. David Inouye of the University of Maryland. Each 
image is of resolution 2592 x 3872 which generates 10,036,244 pixels. We can- 
not implement the LLEC algorithm directly on images from this landscape 
data set as its implementation requires constructing a pixel by pixel matrix. 
We have chosen to implement LLEC in conjuction with the Linde-Buzo-Gray 
algorithm [25] on a set of these images. The simplicity of the LBG al- 
gorithm makes it desirable, but other clustering algorithms such as those 
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discussed in [30], [I9], [25], [27], etc. could be used alternatively. The LBG 
algorithm is an iterative competitive learning algorithm that, in essence, de- 
termines all points that fall within a Voronoi region around specified center 
vectors, calculates the mean of all points within this region, updates the 
center of this set to be equal to the mean, and then iterates the process 
until a fixed number of iterations has been met or some stopping criteria is 
achieved. Proper initialization is a crucial issue for any iterative algorithm 
and can greatly aflFect the outcome. Therefore, we have chosen four diflFerent 
methods to initialize the center vectors for comparison purposes. 

The first method chosen to determine the center vectors used in the iter- 
ative LBG algorithm is LLEC. Here we use the LLEC algorithm to create a 
palette of colors for the data to be clustered around by identifying the natu- 
ral subspaces of subimages of images within the landscape data set, namely 
Figures [7i| |7m| and [7q| The benefit of this method is that all subspaces are 
identified in an unsupervised manner. 

The next method involves choosing the eight three dimensional data 
points with components either or 255 and 17 other data points sampled 
near the green, yellow, blue, white, and black colors as centers. This requires 
supervision from the user to identify which colors seem to predominantly 
appear in the data set of images. 

The third method involves choosing 25 random centers. That is 25 data 
points, [r, b] are chosen such that r^g^b G [0, 255]. Choosing random centers 
in this way does not guarantee that any of the colors identified to be centers 
will be similar to colors that appear in the actual image, and thus, many of 
the centers could be potentially unused in the clustering. 

The final method involves choosing 25 random centers from the data set. 
That is, choose 25 columns of the data matrix X randomly to be the centers 
that the data points are clustered around. The benefit of this method is that 
this is the only approach that identifies actual points within the data set as 
centers. However, not all natural subspaces may be represented as we will 
observe shortly. 

Figure [8] reveals the performance of each method on one sample image 
from the landscape data set. In several implementations on various images 
within the landscape data set, we have observed similar results. It appears 
that all methods for determining the centers result in fairly accurate re- 
construction images. However, we have observed in practice that the two 
methods of using the LLEC algorithm to determine centers and identifying 
random centers within the data set tend to result in the lowest distortion 
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errors as calculated by 



where X is a data set consisting of p points with regard to a set of centers 
labeled by indices J . Note that if we let X* indicate the matrix of points each 
identified with the centroid of the Voronoi region that each point is assigned 
to, then the distortion error could also be calculated as \\X — where 



We notice in Figure [9] however that LLEC gives a better reconstruction 
visually. Observe that the reconstruction obtained by identifying centers as 
random points within the data often does not capture all subspaces within 
the image. In particular, the yellow fiowers in the second example and the 
blue fiowers in the third example do not appear in the reconstruction. Thus, 
it appears that LLEC used in conjunction with LBG is able to reconstruct 
the image with minimal error and the most accurate representation visually. 

5. Conclusion 

In this work, we have presented a novel algorithm, LLEC, to cluster and 
segment the color space of natural imagery. Within this algorithm, is a 
method to reconnect artificially disconnected components (resulting from a 
choice of k nearest neighbors) as well as a technique for one dimensional sub- 
space segmentation that can be extended to multi-dimensional segmentation 
which is robust in the presence of noise. We have seen that LLEC does an 
excellent job of quantizing the color space of imagery with the only input 
parameters directly related to the accuracy of the quantization. However, 
LLEC does have some limitations. As already mentioned, LLEC is com- 
putationally intensive. Also, in the formulation of the LLE algorithm, it is 
required to create matrices of size p x where p is the number of pixels in 
the image. For large images, this may require a prohibitively large amount of 
memory. Thus, LLE and LLEC, in turn, perform well on small sized images 
when being implemented in this manner. However, if techniques such as the 
sampling methods discussed in jT5], jSTj or the stitching method as discussed 
in [11] are implemented, this limitation may be alleviated. Even with these 
limitations, we see that LLEC is useful in identifying the natural subspaces 
within an image. 




.op is the Frobenius norm. 
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Appendix A. Locally Linear Embedding 

Appendix A.l. Nearest Neighbor Search 

The first step in implementing the LLE algorithm is to determine the 
neighbors associated to each of the high dimensional data points. Determin- 
ing the nearest neighbors of a specific data point involves finding those data 
points that are the most similar. One way to measure similarity is to use a 
Euclidean distance metric (however, other metrics may also be used). The 
most straightforward way to perform this task is to determine a fixed number 
of nearest neighbors by considering those data points with the smallest dis- 
tance determined by the metric being used. Alternatively nearest neighbors 
can be identified by classifying as neighbors all data points that fall within a 
ball of fixed radius about each point. Therefore, the number of nearest neigh- 
bors could diflFer for each data point. In this paper, the nearest neighbors of 
each point was found by determining a fixed number, fc, of data points with 
the smallest non-zero Euclidean distance from the original point. 

Determining the fixed number of neighbors, fc, is the only free parameter 
in the LLE algorithm. There is some art in choosing an appropriate value for 
k. If the manifold is well-sampled, then each data point and its nearest neigh- 
bors lie approximately on a locally linear piece of the manifold. However, if 
the number of nearest neighbors, fc, is chosen to be too large, the region may 
no longer be local and might include points which are geodesically far away. 
The span of a set of k points is a linear space of dimension at most k — 1. 
Therefore, the dimension of the target vector space, rf, should be chosen to 
be strictly less than the number of nearest neighbors. However, choosing k 
to be too small may be problematic as the eigenvector problem to determine 
the embedding vectors becomes singular. Note that ||3j did not give guidance 
on how to choose an appropriate number of nearest neighbors. However, [32j 
gives an hierarchical approach to automatically select an optimal parameter 
value which has been shown to be quite precise. 

In the situation where the original dimension of the data, is fairly 
low, it is often necessary to choose the number of neighbors, fc, to be greater 
than this dimension to avoid the eigenvector problem becoming singular. If 
k > then the set of nearest neighbors, A^^, of data point Xi is no longer 
linearly independent, and thus, there is not a unique solution for determin- 
ing the reconstruction weights. In this case, the covariance matrix C defined 
below becomes singular or nearly singular. A regularization must be imple- 
mented in order to suspend this breaking down of the algorithm. One such 
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regularization would be to add a small multiple of the identity to the covari- 
ance matrix which, in turn, corrects the sum of the squares of the weights 
so that the weights favor a uniform distribution. The optimization problem 
then finds the set of weights that come closest to the point representing uni- 
form distribution of magnitude for each of the weights In this paper, the 
regularization that is used is 

C ^ C + / * to/ * Tr{C) 

where tol is a tolerance that is sufficiently small, usually 0.001, and Tr{C) 
denotes the trace of C . This regularizer is sufficent to make the covariance 
matrix well-conditioned allowing one to determine a unique solution to the 
optimization problem to determine the weights. 

Appendix A. 2. Least Squares Problem to Find Weights 

The second step of the LLE algorithm is to determine the weights used 
to associate each point with its nearest neighbors. This can be done by 
minimizing the distance between a point and a linear combination of all of 
its nearest neighbors where the coefficients of this linear combination are 
defined by the weights. Let A^^ be the set of neighbors associated to a single 
point x^, let p be the number of data points being considered, and let D 
denote the dimension of the ambient space of the data. Our goal then is 
to determine the weights, Wij^ associated to each point, x^, and each of its 
nearest neighbors, Xj G A^^. Note that the weight, Wij^ between two points 
that are not nearest neighbors is defined to be 0. Thus, a data point can only 
be reconstructed from points determined to be its nearest neighbors. Now, 
the weights are determined by minimizing differences between each point and 
a linear combination of the nearest neighbors to the point. Let W denote the 
matrix of weights with entries Wij. The cost function of the reconstruction 
error is then given by 

p 

i=i jeNi 

For each i, a constraint, Wij = 1, is implemented to ensure that these 
weights are invariant to translations. Note that the form of the errors ensures 
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the weights are also invariant to rescahngs and rotations. Using the sum-to- 
one constraint, the constraint that Wij = if Xj is not in the set Ni^ and a 
httle hnear algebra, we see that 



e{W) = EN-E 
where c^-^ = (x^ - x^-)^(x^ - x^,). 

Now, we want to minimize these errors using the constraint Wij — 1. This 
can be done using Lagrange Multipliers. Fixing z, we have 

min WjWkCjk A j Wj — 1 
j,keNi \jeNi 

This optimization problem can be solved by finding the critical values of this 
cost function which results in solving the following system of equations 




which yields 

Cw = e 

where C corresponds to the covariance matrix determined by c^^ = (x^ — 
Xj)^{xi — Xfc), w is the column vector of weights associated to a single point, 
and e is the vector of all ones. Thus in order to find the reconstruction 
weights, it is only necessary to solve 

where the weights are rescaled so that they sum to one. Thus, we have 
derived the least squares problem to determine the weights that reconstruct 
the high dimensional data points of dimension D to the lower dimension 
embedding data points of dimension d. We can form a weight matrix, W ^ 
where each row, i, corresponds to the weights between the point x^ and every 
other point. Note that W is extremely sparse as the weight between any two 
points that are not nearest neighbors is defined to be zero. 
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Appendix A. 3. Eigenvector Problem 

The third and final step of the LLE algorithm is to determine the low di- 
mensional embedding vectors, y^, of dimension d by using the reconstruction 
weights, Wij^ of the high dimensional data vectors, x^. The only informa- 
tion used in this portion of the algorithm is the geometry obtained by the 
weights. A cost function for the errors between the reconstruction weights 
and the outputs, y^, is minimized as follows: 

i=l j=l 

Here Y denotes the d x p matrix of embedding vectors. In order to find 
these reconstruction vectors, y^, the following optimization problem must be 
solved for fixed weights, Wij. Using linear algebra, we can manipulate our 
cost function to obtain 

p p 

1=1 3=1 

P 

where < *, * > is the standard Euclidean inner product and Mij — Sij — Wij — 

p 

'^/cz'^/cj- Note that all of the Wij are entries of the weight matrix W. 

k=i 

Thus, 

M^I-W-W^ + W^W = (/ - Wf (/ - W) 

We see that M is symmetric even though Wij is not necessarily equal to Wji. 
In addition, M is extremely sparse and is positive semi-definite. 
It is straightforward to show that 

p 



0(y) = V (y„ y^.) = triYMY"-) 



where Y corresponds to the matrix of the embedding vectors. Our problem 
then becomes 

mmtr{YMY^) 
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subject to YY^ = I 

where the constraint is equivalent to saying that the embedding vectors are 
sphered or whitened. Thus, they are uncorrelated, and their variances equal 
unity. Note that Y is orthogonal in the row space but the embedding vectors 
are not required to be orthogonal. This constraint does not change the prob- 
lem since the reconstruction error is invariant under rotations and rescalings. 
Otherwise, letting = for each i would be the optimal solution. We also 
use the fact that translations do not affect the cost function, so we require 
the outputs to be centered at the origin adding the constraint: 

p 

i=l 

We will again use Lagrange multipliers to solve this problem. Our La- 
grangian becomes 

d 

L{Y, fi) = tr{YMY^) - ^^^Ayy^ " 

where each /x^j is the Lagrange multiplier for each constraint. Taking the 
derivative of this Lagrangian with respect to the matrix Y and equating 
to zero will yield our desired solution. We see that 2YM = 2Ay, where 
Lambda is the diagonal matrix of Lagrange multipliers, is the solution to our 
Langrangian, and it can be manipulated so that MY^ = Y^A. 

Thus, Y^ is the matrix of eigenvectors of M, and A is the correspond- 
ing diagonal matrix of eigenvalues. The optimal embedding up to rotations, 
translations, and rescalings of the embedding space can then be found by 
solving this eigenvector problem. The Rayleigh-Ritz theorem as described 
in [33] gives an indication of which eigenvectors actually solve the problem. 
Using this, we need to obtain the bottom d + 1 eigenvectors of the matrix, 
M, (those eigenvectors corresponding to the smallest eigenvalues in increas- 
ing order). We will see shortly that the eigenvector corresponding to the 
smallest eigenvalue is the unit vector with all equal components correspond- 
ing to the mean of the data. We discard this eigenvector, leaving the second 
through the d + 1 eigenvectors. Thus, the embedding vectors that solve the 
LLE algorithm are these d remaining eigenvectors. When discarding the bot- 
tom eigenvector, it forces each of the other eigenvectors to sum to zero by 
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orthogonality enforcing the constraint 



i=l 



which requires that the embedding vectors have zero mean. 

In order for there to exist a unit eigenvector with aU equal components 
as described above, each row of the M matrix must sum to a scalar, A. In 
fact, we can show that each row has zero sum. Given 



Mij 6ij - Wij - Wji + ^ WkiWkj 



k=i 

we can see 

p p p p p p 

j=l j=l j=l j=l j=l k=l 

p p p 

3=1 3=1 k=l 

p p 

= - ^Wji + ^ Wki 

3=1 k=l 

= 

Thus, there exists a unit eigenvector of all equal components corresponding 
to the eigenvalue zero which we may discard as described above. See Section 



2.4| for a discussion on data sets that have more than one eigenvalue equal 
to zero. 

The third step of the LLE algorithm involves solving this eigenvector 
problem to determine the unique solution Y. The solutions to the problem 
are the rf-dimensional columns of the Y matrix where each column, j, of 

Y corresponds to column, of X, and each row of Y is an eigenvector of 
the matrix M. Note that although each weight was determined locally by 
reconstructing a data point by its nearest neighbors, the optimal embedding 

Y was determined by a p x p eigensolver which is a global undertaking that 
uses the information from all points. Therefore, through the LLE algorithm 
we were able to obtain low-dimensional embedding vectors that preserve 
the local topology of a high-dimensional data set by determining a global 
coordinate system. 
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Appendix B. Initializing Subspace Segmentation 

Subspace segmentation discussed in this paper is initialized by selecting 
a point, y*, and then clustering around this data point. If y* is selected 
randomly, then there is an element of randomness in the algorithm, allowing 
for a variety of subspaces to be determined, depending on each y* selected. 
Here we will discuss three non-random approaches that select points reflecting 
the data density. 

One method is to select y* to be the point within the data set that has 
the most points falling within an epsilon ball of the point. Another method is 
to select y* to be the point whose b^^ nearest neighbor is closer to it than any 
other point within the data set. Here b G an arbitrary number. While the 
first idea is computationally intensive, the second has the complication that 
points are being removed from the data set, so there may occur a moment 
in the algorithm where the number of data points p < b. In this case, we 
adjust 6 to be a number less than the number of data points. For instance, 
b — [|] , where [*] is the ceiling function, works well in practice. 

An alternate approach for selecting the point y* searches for the point 
that falls on a subspace which most reflects a linear structure. This can 
be implemented as follows. Determine an epsilon ball around each point. 
Compute the singular value decomposition of a matrix formed by the points 
in each of these epsilon balls in order to find the singular vectors and singular 
values. Choose y* to be the random point with the smallest ratio of singular 
values ^ as this reflects the subspace with the most linear structure. While 
this approach chooses points falling along structures most easily identified 
as 'linear', one downside is that it is computationally intensive. It typically 
produces reconstructions with a smaller distortion error than all of the other 
methods described above, but it does so by identifying more subspaces. 

The various methods for identifying y* have features that make each of 
them attractive, depending on the user's desired result. In this paper, we have 
chosen to follow the method that chooses y* as a point in a dense region of 
the data by finding the b^^ nearest neighbor with the smallest distance. This 
is the least computationally intensive and produces reconstructions with a 
relatively small distortion error using relatively few subspaces to reconstruct. 
Here we have selected & = 50. 
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Figure 7: Reconstruction images of LLEC with variances tolerances. Column 
two has tolerance ei = 62 = 0.6. Column three has tolerance ei = 62 = 0.4. 
Column four has tolerance ei = 62 = 0.2. S denotes the number of distinct 
subspaces in a reconstruction and DE denotes the distortion error. 
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(a) Original 




(b) LLEC (c) Identifying Centers 




(d) Random Centers (e) Random Centers from Data 

Figure 8: Reconstruction images after quantizing the color space of original 
image with LBG using indicated method to determine the centers. Note that 
the respective distortion errors of each implementation with 15 iterations are: 
140.0250, 342.6351, 219.0756, and 146.7013. 
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(a) 1st Original 



(b) LLEC 



(c) Random from Data 






(d) 2nd Original 



(e) LLEC 



(f) Random from Data 





(g) 3rd Original 



(h) LLEC 



(i) Random from Data 



Figure 9: Quantizing the color space of the original image with LBG using 
indicated method to determine the centers. Note that the respective distor- 
tion errors of these two implementations with 15 iterations are: (1st Original) 
210.3490 and 210.6900, (2nd Original) 140.0250 and 146.7013, (3rd Original) 
172.5580 and 170.7743. 



30 



